It Works on My Machine: Reproducibility in R for Small Teams

Melissa Albino Hegeman, October 19, 2023

Disclaimer

  • I work for NYSDEC, but the opinions I’m presenting are my own and don’t reflect agency policy.

  • Images were generated with Adobe Firefly

About Me

  • Marine biologist

  • Get really seasick

  • Work with fisheries data

How I Am Using R

  • Automate routine tasks
  • Generate individualized reports

What Is a Team?

  • 10 people or fewer
  • Limited to no experience with R
  • No enterprise tools

Sharing the Load

  • What happens after you implement a big change?
  • Who is responsible for maintenance?
  • Who is responsible for new features?

Setting Up for Success

  • R projects
  • GitHub
  • Custom R package
  • {renv}

It Works on My Machine

  • Error in library(tidyverse) : there is no package called ‘tidyverse’

  • Error in plot(data) : object 'data' not found

  • Error in file(file_path, "r") : cannot open the connection

  • cannot create dir 'output', permission denied
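Some of these errors can be caught at the top of a shared script and replaced with a clearer message. This is a minimal base R sketch; `missing_packages()` and `ensure_dir()` are hypothetical helper names, not functions from any package.

```r
# Hypothetical helper: which of `pkgs` are not installed on this machine?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: create a project folder (e.g. "output") if it is missing,
# and stop with a readable message if that is not possible.
ensure_dir <- function(path) {
  if (!dir.exists(path) && !dir.create(path, recursive = TRUE, showWarnings = FALSE)) {
    stop("Could not create '", path, "'; check folder permissions.", call. = FALSE)
  }
  invisible(path)
}

# At the top of a shared script (replace with the packages your script uses):
absent <- missing_packages(c("stats", "utils"))
if (length(absent) > 0) {
  stop("Please install: ", paste(absent, collapse = ", "), call. = FALSE)
}
# ensure_dir("output")
```

A staff member then sees "Please install: tidyverse" instead of a bare `library()` error partway through the script.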

How Do I Fix It?

  • R projects
  • GitHub
  • Custom R package
  • {renv}

R Projects

Issues

  • Not being used as intended

Successes

  • Relative file paths
  • First step in reproducibility
  • Adds portability

GitHub Repositories

Issues

  • Steep learning curve

Successes

  • It’s the most efficient way to get the code on everyone’s machine

  • Gives team members the freedom to experiment

Custom R Package

Issues

  • Keeping everything in sync

  • Updates and maintenance

Successes

  • Everyone is applying the same treatment to the data no matter where they are working

  • Depends on staff regularly updating their installed version
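One way to lean less on staff remembering to update is a version check at the top of each script. A sketch in base R; `check_team_package()` is a hypothetical helper and `"teampkg"` is a placeholder for your own package name.

```r
# Hypothetical helper: stop early if this machine's copy of the shared
# package is older than the version the script was written against.
check_team_package <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed.", call. = FALSE)
  }
  installed <- utils::packageVersion(pkg)
  if (installed < min_version) {
    stop("Package '", pkg, "' is version ", format(installed), " but ",
         min_version, " or newer is needed. Please update it.", call. = FALSE)
  }
  invisible(installed)
}

# Usage at the top of a script ("teampkg" is a placeholder):
# check_team_package("teampkg", "1.2.0")
```

`utils::packageVersion()` returns a version object, so comparing it with a string such as `"1.2.0"` compares version numbers rather than text.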

{renv}

Issues

  • Slow to boot up a project for the first time

  • Staff were updating the lockfile rather than adjusting their installed packages when their project was out of sync
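The out-of-sync problem comes down to which direction you sync: `renv::restore()` changes your installed packages to match `renv.lock`, while `renv::snapshot()` changes `renv.lock` to match your installed packages. A hypothetical wrapper that makes the choice explicit, assuming {renv} is installed and the project already has a lockfile:

```r
# Hypothetical wrapper; renv::status(), renv::restore() and renv::snapshot()
# are real renv functions.
sync_with_lockfile <- function(role = c("user", "maintainer")) {
  role <- match.arg(role)
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("Please install renv first: install.packages(\"renv\")", call. = FALSE)
  }
  if (role == "user") {
    # Running the project: change your library to match renv.lock.
    renv::restore(prompt = FALSE)
  } else {
    # Deliberately changing dependencies: record the new versions in renv.lock.
    renv::snapshot(prompt = FALSE)
  }
}

# Check first; renv::status() reports where the library and lockfile differ.
# renv::status()
# sync_with_lockfile("user")
```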

Successes

  • This is still a work in progress

  • Wait until you are done developing before you initialize {renv}

  • Forced me to minimize the number of dependencies I rely on
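To see how many packages a project actually leans on before running `renv::init()`, a rough scan of the scripts can help. This hypothetical base R helper only looks for `library()`, `require()`, and `pkg::` calls in `.R` files; `renv::dependencies()` is the more thorough tool and also reads R Markdown documents and DESCRIPTION files.

```r
# Hypothetical helper: rough list of packages referenced by a project's R scripts.
list_script_packages <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) return(character(0))
  code <- unlist(lapply(files, readLines, warn = FALSE))

  # library(pkg), library("pkg"), require(pkg), ...
  attach_pattern <- "(library|require)\\([\"']?[A-Za-z][A-Za-z0-9.]*"
  calls <- unlist(regmatches(code, gregexpr(attach_pattern, code)))
  attached <- sub("^(library|require)\\([\"']?", "", as.character(calls))

  # pkg::fun and pkg:::fun
  ns_pattern <- "[A-Za-z][A-Za-z0-9.]*(?=::)"
  namespaced <- as.character(unlist(
    regmatches(code, gregexpr(ns_pattern, code, perl = TRUE))
  ))

  sort(unique(c(attached, namespaced)))
}

# list_script_packages()  # run from the project root
```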

Solutions

  • Consistent and continued training for new staff

  • Simplify wherever you can

  • Avoid scope creep in your projects

Next Steps

::: notes

Everything I’ve talked about so far is from the perspective of a small group without significant IT resources to throw around. However, there is one strategy that I’ve only really associated with larger programs: [advance] containers, such as Docker or Podman. I had been afraid to try this because it seemed really intimidating. But honestly, I’ve been experimenting with containers, and it hasn’t been any more frustrating than anything else I’ve talked about. Having this much control over the development environment is really appealing. The examples I’ve talked about so far involved people running code that was already written, but everyone needs to be working toward developing new features on their own. Having a consistent environment from the start should help them avoid some of the biggest roadblocks to getting started. When you’re trying to lead your group in a certain direction, it is really important to give them successes early on.

Thank You

Melissa Albino Hegeman

https://github.com/mhegeman/2023_rgov